Introduction
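Visualization is useful both before and after data analysis: it presents the data or a model's results in a different form so that the information comes across efficiently, making it easier for others to understand the patterns and trends in the data and to spot outliers. The most basic approach is to present the data with statistical charts such as bar plots, box plots, histograms, and scatter plots. Today's topic, the Python package Matplotlib, is a commonly used visualization tool.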
Importing NumPy and Matplotlib
import numpy as np
import matplotlib.pyplot as plt
Basic syntax
- Pass the data to plt.plot() to draw the plot, then call plt.show() to display the figure.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276qlhkvWxToP.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276qlhkvWxToP.png)
- To draw several lines in one figure, call plt.plot() multiple times.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276P00XDOBvNL.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276P00XDOBvNL.png)
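The function summary at the end of this post lists label and legend(), but none of the plots uses them. The following is a minimal sketch of how they might be combined with the two curves above; the label strings and the loc value are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
# label= gives each line a name; the names only appear once legend() is called
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
# loc is an illustrative choice; omit it to let Matplotlib pick a position
legend = plt.legend(loc="upper right")
plt.show()
```

Without the legend() call the labels are stored on the lines but nothing is drawn for them.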
x = np.linspace(0, 10, 100)
# 2 rows, 1 column: draw into the first (top) panel
plt.subplot(2, 1, 1)
plt.plot(x, np.sin(x))
# Switch to the second (bottom) panel
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276IyASS1EEkj.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276IyASS1EEkj.png)
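The same two-panel layout can also be built with plt.subplots(), which returns the figure and an array of axes objects instead of switching the "current" panel. This is only a sketch of that alternative style; the panel titles, sharex=True, and the tight_layout() call are additions for illustration and are not in the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
# Create a grid of 2 rows and 1 column whose panels share the x axis
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")
axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")
# Adjust spacing so the titles and tick labels do not overlap
fig.tight_layout()
plt.show()
```

Holding explicit axes objects tends to be easier to follow than repeated plt.subplot() calls once a figure has more than a couple of panels.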
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), color = "red", linestyle = "dashed")
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276ekYIJYjUTG.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276ekYIJYjUTG.png)
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.xlim(2, 9)
plt.ylim(-2, 2)
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276IjljPhAMCO.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276IjljPhAMCO.png)
x = np.linspace(0, 10, 100)
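# "seaborn-whitegrid" was renamed to "seaborn-v0_8-whitegrid" in Matplotlib 3.6; use the old name on earlier versions
plt.style.use("seaborn-v0_8-whitegrid")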
plt.plot(x, np.sin(x), "-")
plt.axis("equal")
plt.title("A Sine Curve", fontsize = 12)
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276uZomScFmYN.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276uZomScFmYN.png)
Statistical charts
xpt = [1, 2, 3, 4, 5]
ypt = [1, 4, 7, 16, 25]
plt.xticks(xpt)
plt.text(3, 8, "Hi")
plt.scatter(xpt, ypt, s = 15, c = "r")
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276GiFpfdhwCu.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276GiFpfdhwCu.png)
votes = [150, 400, 380]
N = len(votes)
x = np.arange(N)
width = 0.35
plt.bar(x, votes, width)
plt.ylabel("The number of votes")
plt.title("The Election Results")
plt.xticks(x, ("James", "Peter", "Norton"))
plt.yticks(np.arange(0, 450, 30))
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276POVXMl7BNk.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276POVXMl7BNk.png)
data = np.random.randn(1000)
plt.hist(data, bins = 100, color = "m")
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276Zf17j7Fc1Y.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276Zf17j7Fc1Y.png)
sorts = ["Travel", "Entertainment", "Education", "Transportation", "Food"]
fee = [8000, 2000, 3000, 5000, 6000]
# explode pulls the second slice away from the center; autopct formats the percentage labels
plt.pie(fee, labels = sorts, explode = (0, 0.3, 0, 0, 0), autopct = "%1.2f%%")
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276M2FdV2PA9N.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276M2FdV2PA9N.png)
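The introduction mentions box plots, but the examples above do not include one. The following is a minimal sketch using plt.boxplot(); the three samples are randomly generated for illustration only, and the group names "A", "B", "C" are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fixed seed so the figure is reproducible
rng = np.random.default_rng(0)
# Three made-up samples with different means and spreads
params = [(0, 1), (1, 2), (3, 0.5)]
groups = [rng.normal(loc=m, scale=s, size=200) for m, s in params]
# One box per sample; by default the boxes sit at x = 1, 2, 3
result = plt.boxplot(groups)
plt.xticks([1, 2, 3], ["A", "B", "C"])
plt.ylabel("value")
plt.show()
```

Each box spans the first to third quartile with a line at the median, and points beyond the whiskers are drawn individually, which makes the plot handy for spotting outliers.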
A few examples with the iris dataset
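import pandas as pd
urlprefix = 'https://vincentarelbundock.github.io/Rdatasets/csv/'
dataname = 'datasets/iris.csv'
iris = pd.read_csv(urlprefix + dataname)
# Drop the first column, which only holds row numbers; dropping by position avoids depending on its header name
iris = iris.drop(columns=iris.columns[0])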
iris.hist(bins = 15, figsize=(12,10))
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276o6smZgZELa.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276o6smZgZELa.png)
- The diagonal shows a histogram of each variable; the off-diagonal panels show the pairwise scatter plots between variables.
from pandas.plotting import scatter_matrix
attributes = ["Sepal.Length", "Sepal.Width", "Petal.Length","Petal.Width"]
scatter_matrix(iris[attributes], figsize=(13, 8))
plt.show()
![https://ithelp.ithome.com.tw/upload/images/20220823/20151276XdfaVB66h1.png](https://ithelp.ithome.com.tw/upload/images/20220823/20151276XdfaVB66h1.png)
Summary of commonly used functions
| Function | Description |
| --- | --- |
| plot() | Draw a line plot |
| scatter() | Draw a scatter plot |
| bar() | Draw a bar plot |
| hist() | Draw a histogram |
| pie() | Draw a pie chart |
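
| Function | Description |
| --- | --- |
| title(text) | Set the figure title |
| axis() | Set the axis ranges |
| xlim(min, max) | Set the x-axis range |
| ylim(min, max) | Set the y-axis range |
| label=name | Keyword argument of plotting functions such as plot(); sets the name shown in the legend |
| legend() | Display the legend |
| xlabel(name) | Set the x-axis label |
| ylabel(name) | Set the y-axis label |
| xticks(values) | Set the x-axis tick values |
| yticks(values) | Set the y-axis tick values |
| tick_params() | Set the size and color of the axis ticks |
| text() | Draw a string at the given position |
| show() | Display the figure |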
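
Of the functions in the table, tick_params() is the only one not used in any example in this post. A minimal sketch, assuming the same sine curve as in the earlier sections; the label size and color are arbitrary illustrative values.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
# plot() returns a list of lines; unpack the single line it drew
line, = plt.plot(x, np.sin(x))
# Enlarge the tick labels on both axes and color the ticks and labels blue
plt.tick_params(axis="both", labelsize=14, colors="blue")
plt.show()
```

Passing axis="x" or axis="y" instead of "both" restricts the change to one axis.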